Introduction

The aim of this assignment was to demonstrate the use of basic statistical plots in R.

The data chosen was Uber API data. The start latitude and longitude, along with the end latitude and longitude, were sent to the Uber API. In return, the Uber API responded with the following fields: 1) The types of Uber available (ranging from UberX and Pool to Lux) 2) The highest cost estimate 3) The lowest cost estimate 4) The estimated cost range 5) The distance between the two points 6) The currency of the location 7) The language of one of the ride options (Español)

The following plots were included:

1) Basic statistical plots:

   Scatterplot (Uber name vs highest cost estimate)
   Text (Uber name vs highest cost estimate)
   Bar chart (Uber name vs highest cost estimate)
   Line chart (Date vs probable cost)
   Area chart (Date vs probable cost)
   Dot plot (highest cost estimate)
   Histogram (highest cost estimate)
   Frequency polygon (highest cost estimate)
   Box plot (Uber name vs extrapolated cost estimate, e.g. 8-10 became 8, 9, 10)
   Violin plot (Uber name vs extrapolated cost estimate, e.g. 8-10 became 8, 9, 10)

2) A faceted plot (Uber name - highest cost estimate - lowest cost estimate)

3) A ggmap plot (Start Location and End Location)

4) A Plotly bar plot (Uber name vs highest cost estimate)

The use of ggplot2, ggmap and plotly was demonstrated in RStudio. A copy of the notebook was published on RPubs (link: https://rpubs.com/AnmolChawla/assignment10). An R project with a separate data folder was created, and incremental git commits were made to ensure good practice. The notebook was formatted with R Markdown.

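# Load the data and libraries
# fileEncoding = "UTF-8-BOM" drops the byte-order mark that would otherwise be read into the first column name
df <- read.csv("C:/Users/anmol/Desktop/INF 554/Homework 10/r_assignement/data/uber.csv", fileEncoding = "UTF-8-BOM")
mydata <- read.csv("C:/Users/anmol/Desktop/INF 554/Homework 10/r_assignement/data/uber_data1.csv", fileEncoding = "UTF-8-BOM")
mydata$date <- as.Date(mydata$date)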
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.4
library(plotly)
## Warning: package 'plotly' was built under R version 3.4.4
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(ggmap)
## Warning: package 'ggmap' was built under R version 3.4.4
## 
## Attaching package: 'ggmap'
## The following object is masked from 'package:plotly':
## 
##     wind
df
##       localized_display_name distance display_name
## 1                     UberXL     1.59       UberXL
## 2                  Black SUV     1.59    Black SUV
## 3                       Pool     1.59         Pool
## 4                      UberX     1.59        UberX
## 5                    Espanol     1.59      Espanol
## 6                     Select     1.59       Select
## 7                      Black     1.59        Black
## 8                     Assist     1.59       Assist
## 9                        WAV     1.59          WAV
## 10                       Lux     1.59          Lux
##                              product_id high_estimate low_estimate
## 1  9502f87d-e0d0-488d-b84f-b8537538c339            13           10
## 2  16ecc8ec-7fe5-4c5f-9c68-ff9c696f7d5f            30           23
## 3  e61cd68e-bf67-405e-94c1-4993017c6afe            11            7
## 4  2143f90b-ce68-4f6d-a113-4872b207e626            11            8
## 5  1c770649-4755-45d0-b1de-e8cc6e8639bb            11            8
## 6  3bbfad48-dd77-45dc-9bb0-593821f1a5dd            17           13
## 7  e4578b16-6714-4cba-a131-f8cb56ad4555            20           15
## 8  876a0e0f-e232-4131-a6e2-355db8045030            11            8
## 9  aef7503f-8ab9-4470-8ed9-63d797fa721e            11            8
## 10 dee3ccd0-1736-4397-8799-53eb21ffe92e            36           29
##    duration estimate currency_code
## 1       480   $10-13           USD
## 2       480   $23-30           USD
## 3       480    $7-10           USD
## 4       480    $8-11           USD
## 5       480    $8-11           USD
## 6       480   $13-17           USD
## 7       480   $15-20           USD
## 8       480    $8-11           USD
## 9       480    $8-11           USD
## 10      480   $29-36           USD
mydata
##       serial      name estimate       date  X
## 1          1    UberXL       10 2018-10-01 NA
## 2          2    UberXL       11 2018-10-02 NA
## 3          3    UberXL       12 2018-10-03 NA
## 4          4    UberXL       13 2018-10-04 NA
## 5          5    UberXL       14 2018-10-05 NA
## 6          6 Black SUV       23 2018-10-06 NA
## 7          7 Black SUV       24 2018-10-07 NA
## 8          8 Black SUV       25 2018-10-08 NA
## 9          9 Black SUV       26 2018-10-09 NA
## 10        10 Black SUV       27 2018-10-10 NA
## 11        11      Pool        7 2018-10-11 NA
## 12        12      Pool        8 2018-10-12 NA
## 13        13      Pool        9 2018-10-13 NA
## 14        14      Pool       10 2018-10-14 NA
## 15        15      Pool       11 2018-10-15 NA
## 16        16     UberX        8 2018-10-16 NA
## 17        17     UberX        9 2018-10-17 NA
## 18        18     UberX       10 2018-10-18 NA
## 19        19     UberX       11 2018-10-19 NA
## 20        20     UberX       12 2018-10-20 NA
## 21        21  Español        8 2018-10-21 NA
## 22        22  Español        9 2018-10-22 NA
## 23        23  Español       10 2018-10-23 NA
## 24        24  Español       11 2018-10-24 NA
## 25        25  Español       12 2018-10-25 NA
## 26        26    Select       13 2018-10-26 NA
## 27        27    Select       14 2018-10-27 NA
## 28        28    Select       15 2018-10-28 NA
## 29        29    Select       16 2018-10-29 NA
## 30        30    Select       17 2018-10-30 NA
## 31        31     Black       15 2018-10-31 NA
## 32        32     Black       16 2018-11-01 NA
## 33        33     Black       17 2018-11-02 NA
## 34        34     Black       18 2018-11-03 NA
## 35        35     Black       19 2018-11-04 NA
## 36        36    Assist        8 2018-11-05 NA
## 37        37    Assist        9 2018-11-06 NA
## 38        38    Assist       10 2018-11-07 NA
## 39        39    Assist       11 2018-11-08 NA
## 40        40    Assist       12 2018-11-09 NA
## 41        41       WAV        8 2018-11-10 NA
## 42        42       WAV        9 2018-11-11 NA
## 43        43       WAV       10 2018-11-12 NA
## 44        44       WAV       11 2018-11-13 NA
## 45        45       WAV       12 2018-11-14 NA
## 46        46       Lux       29 2018-11-15 NA
## 47        47       Lux       30 2018-11-16 NA
## 48        48       Lux       31 2018-11-17 NA
## 49        49       Lux       32 2018-11-18 NA
## 50        50       Lux       33 2018-11-19 NA
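
The 50 rows printed above look consistent with taking each option's low_estimate and adding 0 to 4, one value per consecutive day starting on 2018-10-01. The following is a minimal sketch of that expansion, not necessarily how uber_data1.csv was actually built. The names df_small, days_per_option and expanded are hypothetical, and the values are copied from the df output above.

```r
# Option names and lowest estimates, copied from the printed df above
df_small <- data.frame(
  display_name = c("UberXL", "Black SUV", "Pool", "UberX", "Espanol",
                   "Select", "Black", "Assist", "WAV", "Lux"),
  low_estimate = c(10, 23, 7, 8, 8, 13, 15, 8, 8, 29),
  stringsAsFactors = FALSE
)

# Assumption: five travel days per option, with the cost rising by 1 each day
days_per_option <- 5
n_options <- nrow(df_small)

expanded <- data.frame(
  serial = seq_len(n_options * days_per_option),
  name = rep(df_small$display_name, each = days_per_option),
  estimate = rep(df_small$low_estimate, each = days_per_option) +
    rep(0:(days_per_option - 1), times = n_options),
  stringsAsFactors = FALSE
)

# One calendar day per row, starting on 2018-10-01
expanded$date <- as.Date("2018-10-01") + (expanded$serial - 1)

head(expanded)
```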

Basic Plots

#Scatter Plot
#display_name = The name that gets displayed to the user on the Uber app
#high_estimate = The highest cost estimate that the user can be charged
ggplot(df, aes(display_name, high_estimate)) + geom_point()

#text
#display_name = The name that gets displayed to the user on the Uber app
#high_estimate = The highest cost estimate that the user can be charged
#The text label shows the lowest-to-highest cost estimate range for that ride.
ggplot(df, aes(display_name, high_estimate)) + geom_text(aes(label = estimate))

#bar chart
#display_name = The name that gets displayed to the user on the Uber app
#high_estimate = The highest cost estimate that the user can be charged
ggplot(df, aes(display_name, high_estimate)) + geom_bar(stat = "identity")  #stat = "identity" plots the high_estimate values themselves rather than row counts

#Line Chart
#Use case - A rider travels on each of the ten Uber options for five days, making fifty days' worth of travel.
#date = The date of the travel
#estimate = The approximate cost that the rider had to pay.
ggplot(mydata, aes(date, estimate)) + geom_line()

#Area Chart
#Use case - A rider travels on each of the ten Uber options for five days, making fifty days' worth of travel.
#date = The date of the travel
#estimate = The approximate cost that the rider had to pay.
ggplot(mydata, aes(date, estimate)) + geom_area()

#Dot Plot
#high_estimate = The approximate highest cost shown on the app.
#count = How many times that highest cost occurs
ggplot(df, aes(x = high_estimate)) + geom_dotplot(binwidth = 1)

#Histogram
#high_estimate = The approximate highest cost shown on the app.
#count = How many times that highest cost occurs
ggplot(df, aes(x = high_estimate)) + geom_histogram(binwidth = 1)

#Frequency Polygon
#high_estimate = The approximate highest cost shown on the app.
#count = How many times that highest cost occurs
ggplot(df, aes(x = high_estimate)) + geom_freqpoly(color = "blue")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
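
The message above asks for an explicit binwidth. Because high_estimate holds whole-dollar values, a bin width of 1 is one reasonable choice, and the resulting counts should correspond to a simple frequency table. The sketch below assumes the high_estimate values printed in df earlier; the ggplot2 part runs only if the package is installed.

```r
# high_estimate values copied from the printed df above
high_estimate <- c(13, 30, 11, 11, 11, 17, 20, 11, 11, 36)

# Count how often each value occurs; with one-dollar bins these are the
# counts that the dot plot, histogram and frequency polygon are built from
counts <- table(high_estimate)
print(counts)

# Optional: the same frequency polygon with an explicit bin width
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(high_estimate), aes(x = high_estimate)) +
    geom_freqpoly(binwidth = 1, color = "blue")
}
```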

#Box Plot
#name: The name of the option displayed on the app
#Estimate: The cost associated with that option. Range from lowest to highest cost estimate.
ggplot(mydata, aes(name, estimate)) + geom_boxplot()

#Violin Plot
#name: The name of the option displayed on the app
#Estimate: The cost associated with that option. Range from lowest to highest cost estimate.
#The shape is the same for every option because each option's values are evenly spaced at a step of one and occur with the same frequency.
ggplot(mydata, aes(name, estimate)) + geom_violin()
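
The claim that every violin has the same shape can be checked numerically. This is a small sketch on three of the options, with values copied from the mydata output above; sample_data and centered are hypothetical names. Subtracting each option's mean leaves only the spread of its values, which turns out to be identical across options.

```r
# A subset of mydata, rebuilt from the printed output above
sample_data <- data.frame(
  name = rep(c("UberXL", "Pool", "Lux"), each = 5),
  estimate = c(10:14, 7:11, 29:33)
)

# Subtract each option's mean so that only the shape of its distribution remains
centered <- tapply(sample_data$estimate, sample_data$name,
                   function(v) v - mean(v))

# Each option reduces to the same centered values
print(centered)
```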

Faceted Plot

#display_name: The name of the option displayed on the app
#low_estimate: The lowest cost associated with that option.
#high_estimate: The highest cost associated with that option.
ggplot(df, aes(high_estimate, low_estimate, color = display_name)) +
  geom_point() +
  facet_grid(cols=vars(display_name))+
  theme(legend.position="none")

MAP

# GGMAP
#triangle: Represents the start point, plotted at the coordinates stored in the UCLA data frame
#circle: Represents the end point, plotted at the coordinates stored in the USC data frame
bb <- c(left = -125.39, bottom = 31.0, right = -113.5, top = 42.0)
stamenmap.ca <- get_stamenmap(bbox = bb, zoom = 6, maptype = "toner")
## Map from URL : http://tile.stamen.com/toner/6/9/23.png
## Map from URL : http://tile.stamen.com/toner/6/10/23.png
## Map from URL : http://tile.stamen.com/toner/6/11/23.png
## Map from URL : http://tile.stamen.com/toner/6/9/24.png
## Map from URL : http://tile.stamen.com/toner/6/10/24.png
## Map from URL : http://tile.stamen.com/toner/6/11/24.png
## Map from URL : http://tile.stamen.com/toner/6/9/25.png
## Map from URL : http://tile.stamen.com/toner/6/10/25.png
## Map from URL : http://tile.stamen.com/toner/6/11/25.png
## Map from URL : http://tile.stamen.com/toner/6/9/26.png
## Map from URL : http://tile.stamen.com/toner/6/10/26.png
## Map from URL : http://tile.stamen.com/toner/6/11/26.png
USC <- data.frame(label = "USC", lon = -120, lat = 35)
UCLA <- data.frame(label = "UCLA", lon = -118.44, lat = 34.07)
ggmap(stamenmap.ca) +
  geom_point(data = USC, aes(x = lon, y = lat), color = "red", size = 5, alpha = .5) +
  geom_point(data = UCLA, aes(x = lon, y = lat), shape = 17, color = "red", size = 5, alpha = .5)
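
The bounding box bb above covers a much larger region than the two plotted points. As a rough sketch, a tighter box could be derived from the point coordinates themselves. The helper padded_bbox and the padding of 0.5 degrees are assumptions made for illustration, and the coordinates are the ones stored in the USC and UCLA data frames above.

```r
# Hypothetical helper: a bounding box around a set of points, padded on every
# side by `pad` degrees, in the left/bottom/right/top form used for bb above
padded_bbox <- function(lon, lat, pad = 0.5) {
  c(left = min(lon) - pad,
    bottom = min(lat) - pad,
    right = max(lon) + pad,
    top = max(lat) + pad)
}

# Coordinates as stored in the UCLA and USC data frames above
lon <- c(-118.44, -120)
lat <- c(34.07, 35)

bb_tight <- padded_bbox(lon, lat)
print(bb_tight)
```

A box like bb_tight would then be passed as the bbox argument in place of bb, most likely with a higher zoom value.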

Interactive Bar Plot

f <- list(
  family = "Courier New, monospace",
  size = 18,
  color = "#7f7f7f"
)
x <- list(
  title = "Uber Options",
  titlefont = f
)
y <- list(
  title = "Highest Cost Estimate",
  titlefont = f
)
p <- plot_ly(x = df$display_name, y = df$high_estimate, name = "Uber", type = "bar") %>%
  layout(xaxis = x, yaxis = y)

p
## Warning: package 'bindrcpp' was built under R version 3.4.4